Nonconvergence to saddle boundary points under perturbed reinforcement learning

Authors

  • Georgios C. Chasparis
  • Jeff S. Shamma
  • Anders Rantzer
Abstract

For several classes of reinforcement learning schemes, convergence to action profiles that are not Nash equilibria may occur with positive probability under certain conditions on the payoff function. In this paper, we explore how an alternative reinforcement learning scheme, where the strategy of each agent is also perturbed by a strategy-dependent perturbation (or mutation) function, may exclude convergence to non-Nash pure strategy profiles. This approach extends prior analysis on reinforcement learning in games that addresses the issue of convergence to saddle boundary points. It further provides a framework under which the effect of mutations can be analyzed in the context of reinforcement learning.

JEL classifications: C72, C73, D83
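To make the idea in the abstract concrete, the following is a minimal sketch of one possible perturbed reinforcement learning update for a two-player game. It is not taken from the paper: the function names, the constant mutation weight `lam` (the abstract describes a strategy-dependent perturbation, which is simplified here), the uniform mutation distribution, the step size, and the example payoff matrix are all assumptions chosen for illustration. Each agent samples its action from a perturbed copy of its strategy and then reinforces the played action in proportion to the reward it received.

```python
import numpy as np

def perturbed_strategy(x, lam):
    """Mix strategy x with the uniform distribution using weight lam.

    Assumption: a constant mutation weight; a strategy-dependent weight
    could be substituted here.
    """
    n = len(x)
    return (1.0 - lam) * x + lam * np.ones(n) / n

def reinforcement_step(x, action, reward, step):
    """Move x toward the pure strategy of the played action.

    With 0 <= step * reward <= 1 the result stays in the simplex.
    """
    e = np.zeros(len(x))
    e[action] = 1.0
    return x + step * reward * (e - x)

def simulate(payoff, x0, lam, step=0.01, iters=1000, seed=0):
    """Run the sketch for two agents that share one payoff matrix (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    x = [np.array(x0[0], dtype=float), np.array(x0[1], dtype=float)]
    for _ in range(iters):
        # Actions are drawn from the perturbed strategies, not from x itself.
        a = [rng.choice(len(x[i]), p=perturbed_strategy(x[i], lam)) for i in range(2)]
        r = payoff[a[0]][a[1]]
        x = [reinforcement_step(x[i], a[i], r, step) for i in range(2)]
    return x

if __name__ == "__main__":
    # Illustrative coordination game with rewards in (0, 1].
    payoff = [[1.0, 0.2], [0.2, 0.8]]
    print(simulate(payoff, [[0.5, 0.5], [0.5, 0.5]], lam=0.05))
```

In this sketch, every action keeps a selection probability of at least `lam / n` even when the nominal strategy sits at a vertex of the simplex, which is the kind of mechanism the abstract suggests can rule out convergence to non-Nash pure strategy profiles. Whether that actually happens depends on the payoff conditions analyzed in the paper.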

Similar resources

How to Escape Saddle Points Efficiently

This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension (i.e., it is almost “dimension-free”). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are...

Multiple solutions for a perturbed Navier boundary value problem involving the $p$-biharmonic

The aim of this article is to establish the existence of at least three solutions for a perturbed $p$-biharmonic equation depending on two real parameters. The approach is based on variational methods.

Attainability of boundary points under reinforcement learning

This paper investigates the properties of the most common form of reinforcement learning (the “basic model” of Erev and Roth, American Economic Review, 88, 848-881, 1998). Stochastic approximation theory has been used to analyse the local stability of fixed points under this learning process. However, as we show, when such points are on the boundary of the state space, for example, pure strateg...

Convergence Analysis of a Randomly Perturbed Infomax Algorithm for Blind Source Separation

We present a novel variation of the well-known infomax algorithm of blind source separation. Under natural gradient descent, the infomax algorithm converges to a stationary point of a limiting ordinary differential equation. However, due to the presence of saddle points or local minima of the corresponding likelihood function, the algorithm may be trapped around these “bad” stationary points fo...

A method based on the meshless approach for singularly perturbed differential-difference equations with Boundary layers

In this paper, an effective procedure based on coordinate stretching and radial basis functions (RBFs) collocation method is applied to solve singularly perturbed differential-difference equations with layer behavior. It is well known that if the boundary layer is very small, for good resolution of the numerical solution at least one of the collocation points must lie in the boundary layer. In ...

Journal:
  • Int. J. Game Theory

Volume 44

Publication date: 2015